Abstract

We examine an error-correcting coding framework in which each coded symbol is constrained to be a function 
of a fixed subset of the message symbols. With an eye toward distributed storage applications, we seek to design 
systematic codes with good minimum distance that can be decoded efficiently. On this note, we provide theoretical 
bounds on the minimum distance of such a code based on the coded symbol constraints. We refine these bounds in 
the case where we demand a systematic linear code. Finally, we provide conditions under which each of these bounds 
can be achieved by choosing our code to be a subcode of a Reed-Solomon code, allowing for efficient decoding. This 
problem has been considered in multisource multicast network error correction. The problem setup is also reminiscent 
of locally repairable codes. 

I. Introduction 

We consider a scenario in which we must encode $s$ message symbols using a length-$n$ error-correcting code
subject to a set of encoding constraints. Specifically, each coded symbol is a function of only a subset of the 
message symbols. This setup arises in various situations such as in the case of a sensor network in which each 
sensor can measure a certain subset of a set of parameters. The sensors would like to collectively encode the 
readings to allow for the possibility of measurement errors. Another scenario is one in which a client wishes to 
download data files from a set of servers, each of which stores information about a subset of the data files. The 
user should be able to recover all of the data even in the case when some of the file servers fail. Ideally, the user 
should also be able to download the files faster in the absence of server failures. To protect against errors, we 
would like the coded symbols to form an error-correcting code with reasonably high minimum distance. On the
other hand, efficient download of data is permitted when the error-correcting code is of systematic form. Therefore,
in this paper, we present an upper bound on the minimum distance of an error-correcting code when subjected
to encoding constraints, reminiscent of the cut-set bounds presented in [1]. In certain cases, we provide a code
construction that achieves this bound. Furthermore, we refine our bound in the case that we demand a systematic
linear error-correcting code, and present a construction that achieves the bound. In both cases, the codes can be
decoded efficiently because our constructions are subcodes of Reed-Solomon codes.
A. Prior Work

The problem of constructing error-correcting codes with constrained encoding has been addressed by a variety 
of authors. Dau et al. [2], [3], [4] considered the problem of finding linear MDS codes with constrained generator
matrices. They have shown that, under certain assumptions, such codes exist over large enough finite fields, as well 
as over small fields in a special case. A similar problem known as the weakly secure data exchange problem was 
studied in [5], [6]. The problem deals with a set of users, each with a subset of messages, who are interested in
broadcasting their information securely when an eavesdropper is present. In particular, the authors of [6] conjecture
the existence of secure codes based on Reed-Solomon codes and present a randomized algorithm to produce them.
The problem was also considered in the context of multisource multicast network coding in [1], [7], [8]. In [7], the
capacity region of a simple multiple access network with three sources is achieved using Reed-Solomon codes. An
analogous result is derived in [8] for general multicast networks with three sources using Gabidulin codes.

There has been a recent line of work involving codes with local repairability properties, in which every parity 
symbol is a function of a predetermined set of data symbols [9]-[17]. Another
recent paper [18] represents code symbols as vertices of a partially connected graph. Each symbol is a function of
its neighbors and, if erased, can be recovered from them. Our code also utilizes a graph structure, though only to 
describe the encoding procedure. There is not necessarily a notion of an individual code symbol being repairable 
from a designated local subset of the other code symbols. 

II. Problem Setup 

Consider a bipartite graph $\mathcal{G} = (\mathcal{M}, \mathcal{V}, \mathcal{E})$ with $s = |\mathcal{M}| \le |\mathcal{V}| = n$. The set $\mathcal{E}$ is the set of edges of the graph, with $(m_i, c_j) \in \mathcal{E}$ if and only if $m_i \in \mathcal{M}$ is connected to $c_j \in \mathcal{V}$. This graph defines a code where the vertices $\mathcal{M}$ correspond to message symbols and the vertices $\mathcal{V}$ correspond to codeword symbols. A bipartite graph with $s = 3$ and $n = 7$ is depicted in Figure 1. Thus, if each $m_i$ and $c_j$ are assigned values in the finite field $\mathbb{F}_q$ with $q$ elements, then our messages are the vectors $\mathbf{m} = (m_1, \ldots, m_s) \in \mathbb{F}_q^s$ and our codewords are the vectors $\mathbf{c} = (c_1, \ldots, c_n) \in \mathbb{F}_q^n$. Each codeword symbol $c_j$ will be a function of the message symbols to which it is connected, as we will now formalize.

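Henceforth, $[\mathbf{c}]_{\mathcal{I}}$ is the subvector of $\mathbf{c}$ with elements indexed by $\mathcal{I} \subseteq \{1, \ldots, n\}$, and $[\mathbf{A}]_{i,j}$ is the $(i,j)$th element of a matrix $\mathbf{A}$. Let $\mathcal{N}(c_j)$ denote the neighborhood of $c_j \in \mathcal{V}$, i.e. $\mathcal{N}(c_j) = \{m_i \in \mathcal{M} : (m_i, c_j) \in \mathcal{E}\}$. Similarly, define $\mathcal{N}(m_i) = \{c_j \in \mathcal{V} : (m_i, c_j) \in \mathcal{E}\}$. We will also consider neighborhoods of subsets of the vertex sets, i.e. for $\mathcal{V}' \subseteq \mathcal{V}$, $\mathcal{N}(\mathcal{V}') = \bigcup_{c_j \in \mathcal{V}'} \mathcal{N}(c_j)$. The neighborhood of a subset of $\mathcal{M}$ is defined in a similar manner. Let $m_i$ take values in $\mathbb{F}_q$ and associate with each $c_j \in \mathcal{V}$ a function $f_j : \mathbb{F}_q^s \to \mathbb{F}_q$. We restrict each $f_j$ to be a function of $\mathcal{N}(c_j)$ only. Now consider the set $\mathcal{C} = \{(c_1, \ldots, c_n) : c_j = f_j(\mathbf{m}),\ \mathbf{m} \in \mathbb{F}_q^s\}$. The set $\mathcal{C}$ is an error-correcting code of length $n$ and size at most $q^s$. We will denote the minimum distance of $\mathcal{C}$ as $d(\mathcal{C})$. If we restrict $f_j$ to be linear, then we obtain a linear code with dimension at most $s$.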

The structure of the code's generator matrix can be deduced from the graph $\mathcal{G}$. Let $\mathbf{g}_j \in \mathbb{F}_q^{s \times 1}$ be a column vector such that the $i$th entry is zero if $m_i \notin \mathcal{N}(c_j)$. Defining $f_j(\mathcal{N}(c_j)) = \mathbf{m}\mathbf{g}_j$ yields a linear function in which $c_j$ is a function of $\mathcal{N}(c_j)$ only, as required. A concatenation of the vectors $\mathbf{g}_j$ forms the following matrix:

$$\mathbf{G} = \begin{bmatrix} \mathbf{g}_1 & \cdots & \mathbf{g}_n \end{bmatrix} \qquad (1)$$

where $\mathbf{G} \in \mathbb{F}_q^{s \times n}$ is the generator matrix of the code $\mathcal{C}$.

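We associate with the bipartite graph $\mathcal{G} = (\mathcal{M}, \mathcal{V}, \mathcal{E})$ an adjacency matrix $\mathbf{A} \in \{0,1\}^{s \times n}$, where $[\mathbf{A}]_{i,j} = 1$ if and only if $(m_i, c_j) \in \mathcal{E}$. For the example in Figure 1 this matrix is equal to

$$\mathbf{A} = \begin{bmatrix} 1 & 0 & 0 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 & 1 & 1 & 1 \end{bmatrix} \qquad (2)$$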


A valid generator matrix $\mathbf{G}$ (in generic form) is built from $\mathbf{A}$ by replacing non-zero entries with indeterminates.
The values assigned to the indeterminates (from a suitably-sized finite field $\mathbb{F}_q$) determine the dimension of the code and
its minimum distance. For general linear codes, the Singleton bound (on minimum distance) is tight over large
alphabets. In the presence of encoding constraints, the Singleton bound can be rather loose. In the next section,
we derive an upper bound on the minimum distance of any code (linear or non-linear) associated with a bipartite
graph. This bound is reminiscent of the cut-set bounds of Dikaliotis et al. in [1].


A. Subcodes of Reed-Solomon Codes 


Throughout this paper, we use the original definition of an $[n, k]_q$ Reed-Solomon code as in [19], i.e. the $k$-dimensional subspace of $\mathbb{F}_q^n$ given by $\mathcal{C}_{RS} = \{(m(\alpha_1), \ldots, m(\alpha_n)) : \deg(m(x)) < k\}$, where the $m(x)$ are polynomials over $\mathbb{F}_q$ of degree $\deg(m(x))$, and the $\alpha_i \in \mathbb{F}_q$ are distinct (fixed) field elements. Each message vector $\mathbf{m} = (m_0, \ldots, m_{k-1})$ is mapped to a message polynomial $m(x) = \sum_{i=0}^{k-1} m_i x^i$, which is then evaluated at the $n$ elements $\{\alpha_1, \alpha_2, \ldots, \alpha_n\}$ of $\mathbb{F}_q$, known as the defining set of the code. Reed-Solomon codes are MDS codes; their minimum distance attains the Singleton bound, i.e. $d(\mathcal{C}_{RS}) = n - k + 1$.
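The evaluation map above can be sketched in a few lines. The snippet below is only an illustration: it assumes a prime field ($q = 7$, so arithmetic is plain integer arithmetic modulo 7), and the function names `rs_generator_matrix` and `rs_encode`, as well as the ordering of the defining set, are choices made for this example rather than anything fixed by the paper.

```python
# Sketch: Reed-Solomon encoding by polynomial evaluation over a prime field F_p.
# Assumption: p is prime, so field arithmetic is integer arithmetic mod p.

def rs_generator_matrix(alphas, k, p):
    """Row i holds alpha_j^i for every element alpha_j of the defining set."""
    return [[pow(a, i, p) for a in alphas] for i in range(k)]

def rs_encode(message, alphas, p):
    """Evaluate m(x) = sum_i message[i] * x^i at every element of the defining set."""
    codeword = []
    for a in alphas:
        value = 0
        for coeff in reversed(message):  # Horner's rule, highest degree first
            value = (value * a + coeff) % p
        codeword.append(value)
    return codeword

p = 7
alphas = [0, 1, 2, 3, 4, 5, 6]          # n = 7 distinct field elements
G_rs = rs_generator_matrix(alphas, 4, p)  # [7, 4] code
c = rs_encode([1, 0, 0, 1], alphas, p)    # m(x) = 1 + x^3
print(c)
```

For this particular message the codeword has Hamming weight 4, which equals $n - k + 1$, so it is a minimum-weight codeword of the $[7,4]$ code.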

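We can extract a subcode of a Reed-Solomon code that is valid for the bipartite graph $\mathcal{G} = (\mathcal{M}, \mathcal{V}, \mathcal{E})$ as follows: First, let $\mathbb{F}_q$ be a finite field with cardinality $q \ge n$. Associate to each $c_j \in \mathcal{V}$ a distinct element $\alpha_j \in \mathbb{F}_q$. Consider the $i$th row of the adjacency matrix $\mathbf{A}$ of $\mathcal{G}$, and let $t_i(x) = \prod_{j : [\mathbf{A}]_{i,j} = 0} (x - \alpha_j)$. For example, $t_3(x) = (x - \alpha_1)(x - \alpha_2)$ corresponds to the third row of $\mathbf{A}$ in (2). Choose $k$ such that $k > \deg(t_i(x))$ for all $i$. If $\mathbf{t}_i \in \mathbb{F}_q^k$ is the (row) vector of coefficients of $t_i(x)$ and $\mathbf{G}_{RS}$ is the generator matrix of a Reed-Solomon code with defining set $\{\alpha_1, \ldots, \alpha_n\}$ and dimension $k$, then $\mathbf{t}_i \mathbf{G}_{RS} = (t_i(\alpha_1), \ldots, t_i(\alpha_n))$ is a vector that is valid for the $i$th row of $\mathbf{G}$, i.e. if $[\mathbf{A}]_{i,j} = 0$ then $[\mathbf{t}_i \mathbf{G}_{RS}]_j = 0$. Stacking the row vectors $\mathbf{t}_i$ results in a transformation matrix $\mathbf{T}$ that will produce a valid generator matrix $\mathbf{G}$ from $\mathbf{G}_{RS}$:

$$\mathbf{G} = \mathbf{T}\mathbf{G}_{RS} = \begin{bmatrix} \mathbf{t}_1 \\ \vdots \\ \mathbf{t}_s \end{bmatrix} \begin{bmatrix} 1 & \cdots & 1 \\ \alpha_1 & \cdots & \alpha_n \\ \vdots & \ddots & \vdots \\ \alpha_1^{k-1} & \cdots & \alpha_n^{k-1} \end{bmatrix} \qquad (3)$$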
Fig. 1. A bipartite graph with 3 message symbols and 7 code symbols.


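The rank of $\mathbf{G}$ will be equal to the rank of $\mathbf{T}$, and the resulting code $\mathcal{C}$ will have a minimum distance $d(\mathcal{C})$ that is governed by $\mathcal{C}_{RS}$. Indeed, since $\mathcal{C}$ is a subcode of $\mathcal{C}_{RS}$, we have $d(\mathcal{C}) \ge d(\mathcal{C}_{RS})$.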
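The subcode construction of this subsection can be tried out numerically. The following is a minimal sketch under stated assumptions: the field is $\mathbb{F}_7$, the defining set is taken in the order $0, 1, \ldots, 6$, the adjacency matrix is the $3 \times 7$ example of (2) typed in by hand, and all helper names (`times_linear`, `transformation_matrix`, `mat_mul`, `min_weight`) are hypothetical. The dimension $k = 3$ is the smallest value exceeding every $\deg(t_i(x))$ for this matrix.

```python
from itertools import product

P = 7
ALPHAS = [0, 1, 2, 3, 4, 5, 6]   # assumed ordering of the defining set
A = [                             # adjacency matrix of the Fig. 1 example
    [1, 0, 0, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1, 1],
]

def times_linear(coeffs, root, p):
    """Multiply a polynomial (coefficients low to high) by (x - root) mod p."""
    out = [0] * (len(coeffs) + 1)
    for i, c in enumerate(coeffs):
        out[i + 1] = (out[i + 1] + c) % p
        out[i] = (out[i] - root * c) % p
    return out

def transformation_matrix(A, alphas, k, p):
    """Row i holds the coefficients of t_i(x), whose roots sit at the zeros of row i of A."""
    T = []
    for row in A:
        coeffs = [1]
        for j, entry in enumerate(row):
            if entry == 0:
                coeffs = times_linear(coeffs, alphas[j], p)
        assert len(coeffs) <= k, "k must exceed deg t_i(x)"
        T.append(coeffs + [0] * (k - len(coeffs)))
    return T

def mat_mul(X, Y, p):
    return [[sum(x * y for x, y in zip(row, col)) % p for col in zip(*Y)] for row in X]

def min_weight(G, p):
    """Brute-force minimum Hamming weight over all nonzero messages."""
    best = len(G[0])
    for msg in product(range(p), repeat=len(G)):
        if any(msg):
            cw = mat_mul([list(msg)], G, p)[0]
            best = min(best, sum(1 for v in cw if v))
    return best

k = 3
T = transformation_matrix(A, ALPHAS, k, P)
G_rs = [[pow(a, i, P) for a in ALPHAS] for i in range(k)]
G = mat_mul(T, G_rs, P)
print(G, min_weight(G, P))
```

In this instance $\mathbf{T}$ has rank 3, so the subcode has dimension 3 and inherits minimum distance at least $7 - 3 + 1 = 5$ from the $[7,3]$ Reed-Solomon code; the brute-force search confirms that the distance is exactly 5.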

III. Minimum Distance 

In this section, an upper bound on the minimum distance of a code defined by a bipartite graph $\mathcal{G} = (\mathcal{M}, \mathcal{V}, \mathcal{E})$ is derived. The bound closely resembles the cut-set bounds of [1]. In most cases, this bound is tighter than the Singleton bound for a code of length $n$ and dimension $s$. For each $\mathcal{M}' \subseteq \mathcal{M}$ define $n_{\mathcal{M}'} := |\mathcal{N}(\mathcal{M}')|$. This is the number of code symbols $c_j$ in $\mathcal{V}$ that are a function of the information symbols $\mathcal{M}'$. The following proposition characterizes the minimum distance of any code defined by $\mathcal{G}$.

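Proposition 1. Fix a field $\mathbb{F}_q$. For any code $\mathcal{C}$ with $|\mathcal{C}| = q^s$ defined by a fixed graph $\mathcal{G} = (\mathcal{M}, \mathcal{V}, \mathcal{E})$, the minimum distance $d(\mathcal{C})$ obeys

$$d(\mathcal{C}) \le n_{\mathcal{M}'} - |\mathcal{M}'| + 1, \quad \forall \mathcal{M}' \subseteq \mathcal{M}. \qquad (4)$$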

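Proof: Working toward a contradiction, suppose $d(\mathcal{C}) > n_{\mathcal{I}} - |\mathcal{I}| + 1$ for some $\mathcal{I} \subseteq \mathcal{M}$. Let $\mathcal{C}'$ be the encoding of all message vectors $\mathbf{m}$ where $[\mathbf{m}]_{\mathcal{I}^c} \in \mathbb{F}_q^{s - |\mathcal{I}|}$ has some arbitrary but fixed value. Note that $[\mathbf{c}]_{\mathcal{N}(\mathcal{I})^c}$ is the same for all $\mathbf{c} \in \mathcal{C}'$, since the symbols $\mathcal{N}(\mathcal{I})^c$ are a function of $\mathcal{I}^c$ only. Since $|\mathcal{I}| > n_{\mathcal{I}} - d(\mathcal{C}) + 1$, then by the pigeonhole principle there exist $\mathbf{c}_1, \mathbf{c}_2 \in \mathcal{C}'$ such that, without loss of generality, the first $n_{\mathcal{I}} - d(\mathcal{C}) + 1$ symbols of $[\mathbf{c}_1]_{\mathcal{N}(\mathcal{I})}$ and $[\mathbf{c}_2]_{\mathcal{N}(\mathcal{I})}$ are identical. Furthermore, $[\mathbf{c}_1]_{\mathcal{N}(\mathcal{I})^c} = [\mathbf{c}_2]_{\mathcal{N}(\mathcal{I})^c}$. Finally, since $\mathcal{N}(\mathcal{I})$ and $\mathcal{N}(\mathcal{I})^c$ partition $\mathcal{V}$, we obtain $d_H(\mathbf{c}_1, \mathbf{c}_2) \le n - (n_{\mathcal{I}} - d(\mathcal{C}) + 1 + (n - n_{\mathcal{I}})) = d(\mathcal{C}) - 1$, a contradiction. Figure 2 illustrates the relation between $\mathcal{I}$ and the corresponding partition of $\mathcal{V}$. ∎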

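As a direct corollary, we obtain the following upper bound on $d(\mathcal{C})$:

Corollary 1.

$$d(\mathcal{C}) \le \min_{\mathcal{M}' \subseteq \mathcal{M}} \{n_{\mathcal{M}'} - |\mathcal{M}'|\} + 1 \qquad (5)$$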

Our next task is to provide constructions of codes that achieve this bound. 
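For small graphs the bound of Corollary 1 can be evaluated directly by enumerating all non-empty subsets of message symbols. The sketch below does this for the Fig. 1 adjacency matrix and for a second, hypothetical $2 \times 4$ matrix `A_small` chosen only to show a case where the bound falls below the Singleton bound. The function name `min_distance_bound` is an assumption of this example, and the enumeration is exponential in $s$, so it is meant for illustration only.

```python
from itertools import combinations

def min_distance_bound(A):
    """Evaluate min over non-empty M' of n_{M'} - |M'| + 1 for a 0/1 adjacency matrix A."""
    s, n = len(A), len(A[0])
    best = None
    for size in range(1, s + 1):
        for subset in combinations(range(s), size):
            # Neighborhood of the subset: columns touched by at least one of its rows.
            neighborhood = {j for i in subset for j in range(n) if A[i][j]}
            bound = len(neighborhood) - size + 1
            best = bound if best is None else min(best, bound)
    return best

A_fig1 = [
    [1, 0, 0, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1, 1],
]
A_small = [          # hypothetical graph, not taken from the paper
    [1, 1, 0, 0],
    [1, 1, 1, 1],
]

print(min_distance_bound(A_fig1))   # Singleton bound would give 7 - 3 + 1 = 5
print(min_distance_bound(A_small))  # Singleton bound would give 4 - 2 + 1 = 3
```

For `A_fig1` the bound coincides with the Singleton bound, while for `A_small` the first message symbol reaches only two code symbols, which forces the bound down to 2.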
Fig. 2. Partitions of $\mathcal{M}$ and of $\mathcal{V}$ used in the proof of Proposition 1. The set $\mathcal{N}(\mathcal{I})$ is a function of both $\mathcal{I}$ and $\mathcal{I}^c$, while the set $\mathcal{N}(\mathcal{I})^c$ is a function of $\mathcal{I}^c$ only.


IV. Systematic Construction 

In this section, we provide a code construction that achieves the minimum distance bound stated in Corollary 1. We appeal to Hall's Theorem, a well-known result in graph theory that establishes a necessary and sufficient condition for finding a matching in a bipartite graph. Some terminology needed from graph theory is defined in the following subsection.

A. Graph Theory Preliminaries 

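Let $\mathcal{G} = (\mathcal{S}, \mathcal{T}, \mathcal{E})$ be a bipartite graph. A matching is a subset $\tilde{\mathcal{E}} \subseteq \mathcal{E}$ such that no two edges in $\tilde{\mathcal{E}}$ share a common vertex. A vertex is said to be covered by $\tilde{\mathcal{E}}$ if it is incident to an edge in $\tilde{\mathcal{E}}$. An $\mathcal{S}$-covering matching is one by which each vertex in $\mathcal{S}$ is covered. We will abuse terminology and say that an edge $e \in \mathcal{E}$ is unmatched if $e \notin \tilde{\mathcal{E}}$. We can now state Hall's Theorem.

Theorem 1. Let $\mathcal{G} = (\mathcal{S}, \mathcal{T}, \mathcal{E})$ be a bipartite graph. There exists an $\mathcal{S}$-covering matching if and only if $|\mathcal{S}'| \le |\mathcal{N}(\mathcal{S}')|$ for all $\mathcal{S}' \subseteq \mathcal{S}$.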

For a proof of the theorem, see e.g. [20, p. 53].
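Both sides of Hall's Theorem are easy to check on small instances. The sketch below tests Hall's condition by brute force and, independently, searches for an $\mathcal{S}$-covering matching with a standard augmenting-path routine. The adjacency-list representation, the function names `hall_condition_holds` and `covering_matching`, and the second (infeasible) toy graph are assumptions made for this illustration.

```python
from itertools import combinations

def hall_condition_holds(adj):
    """adj[i] is the set of right vertices adjacent to left vertex i."""
    for size in range(1, len(adj) + 1):
        for subset in combinations(range(len(adj)), size):
            if len(set().union(*(adj[i] for i in subset))) < size:
                return False
    return True

def covering_matching(adj):
    """Augmenting-path search; returns {left: right} covering every left vertex, or None."""
    match_right = {}  # right vertex -> left vertex currently matched to it

    def try_assign(i, seen):
        for j in adj[i]:
            if j in seen:
                continue
            seen.add(j)
            # Take j if it is free, or if its current partner can be moved elsewhere.
            if j not in match_right or try_assign(match_right[j], seen):
                match_right[j] = i
                return True
        return False

    for i in range(len(adj)):
        if not try_assign(i, set()):
            return None
    return {i: j for j, i in match_right.items()}

# Neighborhoods of m_1, m_2, m_3 in the Fig. 1 example (0-indexed code symbols).
adj_fig1 = [{0, 3, 4, 5, 6}, {0, 1, 2, 4, 5, 6}, {2, 3, 4, 5, 6}]
adj_bad = [{0}, {0}]  # hypothetical graph violating Hall's condition

matching = covering_matching(adj_fig1)
print(hall_condition_holds(adj_fig1), matching)
print(hall_condition_holds(adj_bad), covering_matching(adj_bad))
```

The two functions agree on both inputs, as the theorem predicts: a covering matching is found exactly when Hall's condition holds.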

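Set $d_{\min} = \min_{\mathcal{M}' \subseteq \mathcal{M}} \{n_{\mathcal{M}'} - |\mathcal{M}'|\} + 1$. In order to construct a generator matrix $\mathbf{G} \in \mathbb{F}_q^{s \times n}$ for a code $\mathcal{C}$ with minimum distance $d_{\min}$, we will use an $[n, n - d_{\min} + 1]$ Reed-Solomon code with generator matrix $\mathbf{G}_{RS}$. We will then extract $\mathcal{C}$ as a subcode using an appropriately built transformation matrix $\mathbf{T}$ to form $\mathbf{G} = \mathbf{T}\mathbf{G}_{RS}$ such that $\mathbf{G}$ is in systematic form, which implies that the dimension of $\mathcal{C}$ is $s$. Since $\mathcal{C}$ is a subcode of a code with minimum distance $d_{\min}$, we have $d(\mathcal{C}) \ge d_{\min}$. Corollary 1 further implies that $d(\mathcal{C}) = d_{\min}$.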

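Our construction is as follows: consider a graph $\mathcal{G} = (\mathcal{M}, \mathcal{V}, \mathcal{E})$ defining $\mathcal{C}$, and define the set $\mathcal{A} = \{c_j : \mathcal{N}(c_j) = \mathcal{M}\}$, i.e. $\mathcal{A}$ is the set of code symbols that are a function of every message symbol. Note that $\mathcal{A} \subseteq \mathcal{N}(\mathcal{M}')$ for every $\mathcal{M}' \subseteq \mathcal{M}$. Therefore, if $a = |\mathcal{A}|$ then the size of the neighborhood $\mathcal{N}(\mathcal{M}')$ can be expressed as $n_{\mathcal{M}'} = r_{\mathcal{M}'} + a$, where $r_{\mathcal{M}'}$ is the cardinality of the set $\mathcal{R}(\mathcal{M}') = \mathcal{N}(\mathcal{M}') \setminus \mathcal{A}$.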

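Theorem 2. Let $\mathcal{G} = (\mathcal{M}, \mathcal{V}, \mathcal{E})$. Set $d_{\min} = \min_{\mathcal{M}' \subseteq \mathcal{M}} \{n_{\mathcal{M}'} - |\mathcal{M}'|\} + 1$ and $k_{\min} = n - d_{\min} + 1$. A linear code $\mathcal{C}$ with parameters $[n, s, d_{\min}]$ valid for $\mathcal{G}$ can be constructed with a systematic-form generator matrix provided that $k_{\min} \ge r_{\mathcal{M}}$.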
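Proof: First, we establish a bound on $a$. Note that since $n = n_{\mathcal{M}} = r_{\mathcal{M}} + a$ and $k_{\min} \ge r_{\mathcal{M}}$, we have $a \ge d_{\min} - 1$. Fix an arbitrary subset $\mathcal{A}^* \subseteq \mathcal{A}$ of size $a^* = a - (d_{\min} - 1)$, which is guaranteed to exist by virtue of the bound on $a$, and let $\mathcal{B} = \mathcal{A} \setminus \mathcal{A}^*$. Now, we focus on a particular subgraph of $\mathcal{G}$ defined by $\mathcal{G}^* = (\mathcal{M}, \mathcal{V}^*, \mathcal{E}^*)$ where $\mathcal{V}^* = \mathcal{V} \setminus \mathcal{B}$, and $\mathcal{E}^* = \{(m_i, c_j) \in \mathcal{E} : c_j \in \mathcal{V}^*\}$ is the edge set corresponding to this subgraph. Since $n_{\mathcal{M}'} = r_{\mathcal{M}'} + a$, then from the definition of $d_{\min}$ we have

$$|\mathcal{M}'| \le r_{\mathcal{M}'} + a - (d_{\min} - 1), \quad \forall \mathcal{M}' \subseteq \mathcal{M}. \qquad (6)$$

The neighborhood of every subset $\mathcal{M}'$ when restricted to $\mathcal{V}^*$ is exactly $\mathcal{N}^*(\mathcal{M}') = \mathcal{R}(\mathcal{M}') \cup \mathcal{A}^*$, with cardinality $n^*_{\mathcal{M}'} = r_{\mathcal{M}'} + a^*$. The bounds (6) can now be expressed in a way suitable for the condition of Hall's theorem:

$$|\mathcal{M}'| \le n^*_{\mathcal{M}'}, \quad \forall \mathcal{M}' \subseteq \mathcal{M}. \qquad (7)$$

An $\mathcal{M}$-covering matching in $\mathcal{G}^*$ can be found by letting $\mathcal{S} = \mathcal{M}$ and $\mathcal{T} = \mathcal{V}^*$ in Theorem 1. Let $\tilde{\mathcal{E}} = \{(m_i, c_{j(i)})\}_{i=1}^{s} \subseteq \mathcal{E}^*$ be such a matching, and $\tilde{\mathcal{V}}$ the subset of $\mathcal{V}^*$ that is covered by $\tilde{\mathcal{E}}$. Let $\mathbf{A}_{\tilde{\mathcal{E}}}$ be the adjacency matrix of $\mathcal{G}$ when the edge set $\{(m_i, c_j) \in \mathcal{E} : c_j \in \tilde{\mathcal{V}},\ j \ne j(i)\}$ is removed. The number of zeros in any row of $\mathbf{A}_{\tilde{\mathcal{E}}}$ is at most $n - d_{\min}$. To see this, note that the edges in $\mathcal{E}$ incident to $\mathcal{B}$ are not removed by the matching, and every $m_i \in \mathcal{M}$ is connected to at least one vertex in $\mathcal{V}^*$, so each row retains at least $|\mathcal{B}| + 1 = d_{\min}$ ones. Next, we build a valid $\mathbf{G}$ for $\mathcal{G}$ using $\mathbf{A}_{\tilde{\mathcal{E}}}$, utilizing the method described in Section II-A. Fix an $[n, n - d_{\min} + 1]$ Reed-Solomon code with generator matrix $\mathbf{G}_{RS}$ and defining set $\{\alpha_1, \ldots, \alpha_n\}$. The $i$th transformation polynomial is $t_i(x) = \prod_{j : [\mathbf{A}_{\tilde{\mathcal{E}}}]_{i,j} = 0} (x - \alpha_j)$. Since the number of zeros in any row of $\mathbf{A}_{\tilde{\mathcal{E}}}$ is at most $n - d_{\min}$, we have $\deg(t_i(x)) \le n - d_{\min} = k - 1$ for all $i$. We use the $t_i(x)$, after normalizing by $t_i(\alpha_{j(i)})$, to construct a transformation matrix $\mathbf{T}$, and then $\mathbf{G} = \mathbf{T}\mathbf{G}_{RS}$ is valid for $\mathcal{G}$. Note that $\mathbf{G}$ is in systematic form due to the fact that the columns of $\mathbf{A}_{\tilde{\mathcal{E}}}$ indexed by $\{j(i)\}_{i=1}^{s}$ form a permutation of the identity matrix of size $s$. Lastly, $d(\mathcal{C}) = d_{\min}$ since $d(\mathcal{C}) \le d_{\min}$ by Corollary 1, and $d(\mathcal{C}) \ge d_{\min}$ since $\mathcal{C}$ is a subcode of a code with minimum distance $d_{\min}$. ∎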
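Whether a given graph meets the hypothesis $k_{\min} \ge r_{\mathcal{M}}$ of Theorem 2 can be checked mechanically. The sketch below computes $a$, $r_{\mathcal{M}}$, $d_{\min}$ and $k_{\min}$ from a 0/1 adjacency matrix. The function name `theorem2_quantities` and the second matrix `A_small` are hypothetical and serve only as an illustration; the first matrix is the Fig. 1 example.

```python
from itertools import combinations

def theorem2_quantities(A):
    """Compute the quantities appearing in Theorem 2 for a 0/1 adjacency matrix A."""
    s, n = len(A), len(A[0])
    # a: code symbols connected to every message symbol.
    a = sum(1 for j in range(n) if all(A[i][j] for i in range(s)))
    # r_M: code symbols in the neighborhood of M that are not in the set counted by a.
    r_M = sum(1 for j in range(n) if any(A[i][j] for i in range(s))) - a
    d_min = min(
        len({j for i in sub for j in range(n) if A[i][j]}) - size + 1
        for size in range(1, s + 1)
        for sub in combinations(range(s), size)
    )
    k_min = n - d_min + 1
    return {"a": a, "r_M": r_M, "d_min": d_min, "k_min": k_min,
            "condition_holds": k_min >= r_M}

A_fig1 = [
    [1, 0, 0, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1, 1],
]
A_small = [          # hypothetical graph, not taken from the paper
    [1, 0, 1, 1],
    [0, 1, 1, 1],
]

print(theorem2_quantities(A_fig1))
print(theorem2_quantities(A_small))
```

For the Fig. 1 graph the condition fails ($k_{\min} = 3 < r_{\mathcal{M}} = 4$), so Theorem 2 does not guarantee a systematic code at distance $d_{\min} = 5$ there. For `A_small` the condition holds, and the theorem yields a systematic $[4, 2, 3]$ code.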

V. Minimum Distance for Systematic Linear Codes 

In this section, we will restrict our attention to the case where a code valid for $\mathcal{G}$ is linear, so that each $c_j \in \mathcal{V}$ is a linear function of the message symbols $m_i \in \mathcal{N}(c_j)$. We seek to answer the following: What is the greatest minimum distance attainable by a systematic linear code valid for $\mathcal{G}$?

Any systematic code must correspond to a matching $\tilde{\mathcal{E}} \subseteq \mathcal{E}$ which identifies each message symbol $m_i \in \mathcal{M}$ with a unique codeword symbol $c_{j(i)} \in \mathcal{V}$, where $j(i) \in \{1, \ldots, n\}$. Explicitly, $\tilde{\mathcal{E}}$ consists of $s$ edges of the form $(m_i, c_{j(i)})$ for $i = 1, \ldots, s$ such that $c_{j(i_1)} \ne c_{j(i_2)}$ for $i_1 \ne i_2$. As before, $\tilde{\mathcal{V}}$ is the subset of vertices in $\mathcal{V}$ which are involved in the matching: $\tilde{\mathcal{V}} = \{c_{j(i)}\}_{i=1}^{s}$. Our code becomes systematic by setting $c_{j(i)} = m_i$ for $i = 1, \ldots, s$, and choosing each remaining codeword symbol $c_j \notin \tilde{\mathcal{V}}$ to be some linear function of its neighboring message symbols $m_i \in \mathcal{N}(c_j)$.

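Definition 1. For $\mathcal{G} = (\mathcal{M}, \mathcal{V}, \mathcal{E})$, let $\tilde{\mathcal{E}} \subseteq \mathcal{E}$ be an $\mathcal{M}$-covering matching so that $\tilde{\mathcal{E}} = \{(m_i, c_{j(i)})\}_{i=1}^{s}$. Let $\tilde{\mathcal{V}} = \{c_{j(i)}\}_{i=1}^{s}$ be the vertices in $\mathcal{V}$ which are covered by $\tilde{\mathcal{E}}$. Define the matched adjacency matrix $\mathbf{A}_{\tilde{\mathcal{E}}} \in \{0,1\}^{s \times n}$ so that $[\mathbf{A}_{\tilde{\mathcal{E}}}]_{i,j} = 1$ if and only if either $(m_i, c_j) \in \tilde{\mathcal{E}}$, or $c_j \notin \tilde{\mathcal{V}}$ and $(m_i, c_j) \in \mathcal{E}$. In other words, $\mathbf{A}_{\tilde{\mathcal{E}}}$ is the adjacency matrix of the bipartite graph formed by starting with $\mathcal{G}$ and deleting the edges $\{(m_i, c_j) \in \mathcal{E} : c_j \in \tilde{\mathcal{V}} \text{ and } j \ne j(i)\}$.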

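Definition 2. Let $\tilde{\mathcal{E}} \subseteq \mathcal{E}$ be a matching for $\mathcal{G} = (\mathcal{M}, \mathcal{V}, \mathcal{E})$ which covers $\mathcal{M}$. Let $z_{\tilde{\mathcal{E}}}$ be the maximum number of zeros in any row of the corresponding matched adjacency matrix $\mathbf{A}_{\tilde{\mathcal{E}}}$, and define $k_{\tilde{\mathcal{E}}} := z_{\tilde{\mathcal{E}}} + 1$. Furthermore, define $k_{sys} = \min_{\tilde{\mathcal{E}}} k_{\tilde{\mathcal{E}}}$, where $\tilde{\mathcal{E}}$ ranges over all matchings for $\mathcal{G}$ which cover $\mathcal{M}$, and $d_{sys} = n - k_{sys} + 1$.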
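Definitions 1 and 2 translate directly into a brute-force computation for small graphs. The sketch below enumerates every $\mathcal{M}$-covering matching, counts the zeros in each row of the matched adjacency matrix, and reports $k_{sys}$, $d_{sys}$ and one minimizing matching. The function name `systematic_parameters` is an assumption of this example, and the enumeration over ordered column choices is only practical for small $s$ and $n$.

```python
from itertools import permutations

def systematic_parameters(A):
    """Return (k_sys, d_sys, matching) where matching[i] = j(i), or None if no matching exists."""
    s, n = len(A), len(A[0])
    best = None
    for cols in permutations(range(n), s):        # cols[i] plays the role of j(i)
        if any(A[i][cols[i]] == 0 for i in range(s)):
            continue                              # (m_i, c_j(i)) is not an edge of the graph
        matched = set(cols)
        z = 0
        for i in range(s):
            # Zeros of row i in the matched adjacency matrix: original zeros,
            # plus matched columns that belong to another message symbol.
            zeros = sum(1 for j in range(n)
                        if A[i][j] == 0 or (j in matched and j != cols[i]))
            z = max(z, zeros)
        k = z + 1
        if best is None or k < best[0]:
            best = (k, cols)
    if best is None:
        return None
    k_sys, cols = best
    return k_sys, n - k_sys + 1, cols

A_fig1 = [
    [1, 0, 0, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1, 1],
]
print(systematic_parameters(A_fig1))
```

For the Fig. 1 graph this gives $k_{sys} = 4$ and $d_{sys} = 4$, attained for instance by matching each $m_i$ to the $i$th code symbol.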

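Lemma 1. For a given bipartite graph $\mathcal{G} = (\mathcal{M}, \mathcal{V}, \mathcal{E})$ which admits a matching that covers $\mathcal{M}$, we have

$$s \le k_{\min} \le k_{sys} \le n \qquad (8)$$

and

$$d_{sys} \le d_{\min}. \qquad (9)$$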

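Proof: Let $\mathbf{A}$ be the adjacency matrix of $\mathcal{G}$.

For any subset $\mathcal{M}' \subseteq \mathcal{M}$ we have $d_{\min} \le n_{\mathcal{M}'} - |\mathcal{M}'| + 1$, and likewise $k_{\min} = n - d_{\min} + 1 \ge |\mathcal{M}'| + (n - n_{\mathcal{M}'})$. Taking $\mathcal{M}' = \mathcal{M}$ (and noting that in our framework, every $c_j \in \mathcal{V}$ is connected to at least one vertex in $\mathcal{M}$, hence $n_{\mathcal{M}} = n$) we obtain $k_{\min} \ge s$.

Now choose a set $\mathcal{M}'$ for which the above relation holds with equality, that is, $k_{\min} = |\mathcal{M}'| + (n - n_{\mathcal{M}'})$. Since $\mathcal{N}(\mathcal{M}')$ is simply the union of the support sets of the rows of $\mathbf{A}$ corresponding to $\mathcal{M}'$, each of these rows must have at least $n - n_{\mathcal{M}'} = |\mathcal{N}(\mathcal{M}')^c|$ zeros. Furthermore, any matching $\tilde{\mathcal{E}}$ which covers $\mathcal{M}$ must identify the rows of $\mathcal{M}'$ with columns of $\mathcal{N}(\mathcal{M}')$. Thus, in the matched adjacency matrix $\mathbf{A}_{\tilde{\mathcal{E}}}$, the row corresponding to any $m_i \in \mathcal{M}'$ must have $|\mathcal{M}'| - 1$ zeros in the columns of $\mathcal{N}(\mathcal{M}')$ which are matched to $\mathcal{M}' \setminus \{m_i\}$, in addition to the $n - n_{\mathcal{M}'}$ zeros in the columns corresponding to $\mathcal{N}(\mathcal{M}')^c$. This gives us $k_{\tilde{\mathcal{E}}} \ge |\mathcal{M}'| + (n - n_{\mathcal{M}'})$ for each matching $\tilde{\mathcal{E}}$, hence $k_{sys} \ge k_{\min}$. It follows directly that $d_{sys} \le d_{\min}$. Finally, every row of $\mathbf{A}_{\tilde{\mathcal{E}}}$ contains at least one nonzero entry, so for any $\mathcal{M}$-covering matching $\tilde{\mathcal{E}}$ the quantity $k_{\tilde{\mathcal{E}}}$ is at most the length $n$ of the adjacency matrix $\mathbf{A}$, hence $k_{sys} \le n$. ∎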

Corollary 2. Let $\mathcal{G} = (\mathcal{M}, \mathcal{V}, \mathcal{E})$ be a bipartite graph which admits a systematic linear code. The largest minimum distance obtainable by a systematic linear code is $d_{sys}$.

Proof: Let $\mathcal{C}$ be a systematic linear code which is valid for $\mathcal{G}$. Its systematic generator matrix has zeros wherever the matched adjacency matrix of the corresponding matching does, and some row of that matrix has at least $k_{sys} - 1$ zeros. Then $\mathcal{C}$ must have a codeword containing at least $k_{sys} - 1$ zeros, i.e. a codeword of Hamming weight at most $n - k_{sys} + 1 = d_{sys}$. Since the code is linear, this Hamming weight is an upper bound for its minimum distance, so $d(\mathcal{C}) \le d_{sys}$.

It remains to see that there are systematic linear codes which are valid for $\mathcal{G}$ and achieve a minimum distance of $d_{sys}$. Let $\tilde{\mathcal{E}}$ be an $\mathcal{M}$-covering matching for $\mathcal{G}$ such that $k_{\tilde{\mathcal{E}}} = k_{sys}$. Then for any $k \ge k_{sys}$, we claim that an $[n, k]$ Reed-Solomon code contains a systematic linear subcode that is valid for $\mathcal{G}$. Indeed, choose a set of $n$ distinct elements $\{\alpha_1, \ldots, \alpha_n\} \subseteq \mathbb{F}_q$ as the defining set of our Reed-Solomon code. Then to form our subcode's generator matrix $\mathbf{G}$, note that (as mentioned before) $\mathbf{G}$ must have zero entries in the same positions as the zero entries of $\mathbf{A}_{\tilde{\mathcal{E}}}$, and indeterminate elements in the remaining positions. There are at most $k_{sys} - 1$ zeros in any row of $\mathbf{A}_{\tilde{\mathcal{E}}}$ (and at least $s - 1$ zeros in each row, since there must be $s$ columns which have nonzero entries in exactly one row). For each row $i \in \{1, \ldots, s\}$ of $\mathbf{A}_{\tilde{\mathcal{E}}}$, let $\mathcal{I}_i \subseteq \{1, \ldots, n\}$ be the set of column indices $j$ such that $[\mathbf{A}_{\tilde{\mathcal{E}}}]_{i,j} = 0$. Then form the polynomial $t_i(x) = \prod_{j \in \mathcal{I}_i} (x - \alpha_j)$, normalized by $t_i(\alpha_{j(i)})$, which accordingly has degree at most $k_{sys} - 1$ (and at least $s - 1$). We now set the $i$th row of $\mathbf{G}$ to be $(t_i(\alpha_1), \ldots, t_i(\alpha_n))$, and we see that by construction this row has zeros precisely at the indices $j \in \mathcal{I}_i$ as desired.

The rows of $\mathbf{G}$ generate a code with minimum distance at least that of the original Reed-Solomon code, which is $n - k + 1$. Furthermore, by setting $k = k_{sys}$ for our Reed-Solomon code, we see this new code $\mathcal{C}$ has minimum distance at least $n - k_{sys} + 1 = d_{sys}$. Since by our previous argument $d(\mathcal{C}) \le d_{sys}$, the minimum distance of $\mathcal{C}$ must achieve $d_{sys}$ with equality. ∎

VI. Achievability Using MDS Codes 

Throughout this paper, we have utilized Reed-Solomon codes to construct systematic linear codes valid for a particular $\mathcal{G} = (\mathcal{M}, \mathcal{V}, \mathcal{E})$ that attain the highest possible distance. It is worth mentioning that this choice is not necessary: the Reed-Solomon code can be replaced with any linear MDS code with the same parameters.


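Lemma 2. Fix an arbitrary $[n, k]$ linear MDS code $\mathcal{C}$. For any $\mathcal{I} \subseteq [n]$ where $|\mathcal{I}| \le k - 1$, there exists a nonzero $\mathbf{c} \in \mathcal{C}$ such that $[\mathbf{c}]_{\mathcal{I}} = \mathbf{0}$.

Proof: Let $\mathbf{G} = [\mathbf{g}_i]_{i=1}^{n}$ be the generator matrix of $\mathcal{C}$ and let $\mathbf{G}_{\mathcal{I}} = [\mathbf{g}_i]_{i \in \mathcal{I}}$. Since $|\mathcal{I}| \le k - 1$ and $\mathcal{C}$ is MDS, $\mathbf{G}_{\mathcal{I}}$ has full column rank and so it has a non-trivial left nullspace of dimension $k - |\mathcal{I}|$. If $\mathbf{h}$ is any nonzero vector in that nullspace, then $\mathbf{c} = \mathbf{h}\mathbf{G}$ is nonzero and satisfies $[\mathbf{c}]_{\mathcal{I}} = \mathbf{0}$. ∎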
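The proof of Lemma 2 is constructive, and the left-nullspace vector can be found by Gaussian elimination. The sketch below does this over a prime field, using a $[7,4]$ Reed-Solomon code over $\mathbb{F}_7$ as the MDS code. The function name `nullspace_vector`, the index set `I`, and the restriction to prime fields are assumptions of this example. It relies on `pow(x, -1, p)` for modular inverses, which requires Python 3.8 or later.

```python
def nullspace_vector(equations, ncols, p):
    """Return a nonzero x (length ncols) with equation . x = 0 mod p for every row, or None."""
    rows = [r[:] for r in equations]
    pivots = []   # pivots[t] is the pivot column of reduced row t
    r = 0
    for c in range(ncols):
        if r == len(rows):
            break
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] % p), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        inv = pow(rows[r][c], -1, p)
        rows[r] = [(v * inv) % p for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] % p:
                f = rows[i][c]
                rows[i] = [(u - f * v) % p for u, v in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    free = [c for c in range(ncols) if c not in pivots]
    if not free:
        return None
    x = [0] * ncols
    x[free[0]] = 1                       # set one free variable, solve for the pivots
    for t, c in enumerate(pivots):
        x[c] = (-rows[t][free[0]]) % p
    return x

p, k = 7, 4
alphas = [0, 1, 2, 3, 4, 5, 6]
G = [[pow(a, i, p) for a in alphas] for i in range(k)]   # [7, 4] Reed-Solomon code
I = [0, 1, 3]                                            # |I| = 3 <= k - 1

# h G_I = 0 means h is orthogonal to every column g_j with j in I.
columns = [[G[i][j] for i in range(k)] for j in I]
h = nullspace_vector(columns, k, p)
c = [sum(h[i] * G[i][j] for i in range(k)) % p for j in range(len(alphas))]
print(h, c)
```

Here `h` holds the coefficients of a degree-3 polynomial whose roots are exactly the three field elements indexed by `I`, so the codeword `c` vanishes on `I` and is nonzero elsewhere.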


Therefore, to produce a valid linear code $\mathcal{C}$ for $\mathcal{G} = (\mathcal{M}, \mathcal{V}, \mathcal{E})$ with $d(\mathcal{C}) = d^*$, where $d^* \le n_{m_i}$ for all $m_i \in \mathcal{M}$, we fix an arbitrary $[n, n - d^* + 1]$ MDS code and then select vectors $\mathbf{h}_1, \ldots, \mathbf{h}_s$ such that $\mathbf{h}_i$ is in the left nullspace of $\mathbf{G}_{\mathcal{I}_i}$, where $\mathcal{I}_i = \{j : [\mathbf{A}]_{i,j} = 0\}$. Note that the specific selection of the $\mathbf{h}_i$ determines the dimension of $\mathcal{C}$. For a systematic construction, in which the dimension of the code is guaranteed to be $s$, some extra care has to be taken when choosing the $\mathbf{h}_i$. We must choose each $\mathbf{h}_i$ such that it is not in the left nullspace of $\mathbf{g}_{j(i)}$, which is the column corresponding to the systematic coordinate $c_{j(i)}$.


VII. Example 


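In this section, we construct a systematic linear code that is valid for the graph in Figure 1. The bound of Corollary 1 asserts that $d(\mathcal{C}) \le 5$ for any $\mathcal{C}$ valid for $\mathcal{G}$. However, Lemma 1 shows that $d(\mathcal{C}_{sys}) \le 4$ for any valid systematic linear code $\mathcal{C}_{sys}$. A matching achieving this bound is given by the edges $\tilde{\mathcal{E}} = \{(m_1, v_1), (m_2, v_2), (m_3, v_3)\}$ and so the edges removed from the graph are $\{(m_2, v_1), (m_2, v_3)\}$. The new adjacency matrix $\mathbf{A}_{\tilde{\mathcal{E}}}$ is given by

$$\mathbf{A}_{\tilde{\mathcal{E}}} = \begin{bmatrix} 1 & 0 & 0 & 1 & 1 & 1 & 1 \\ \mathbf{0} & 1 & \mathbf{0} & 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 & 1 & 1 & 1 \end{bmatrix} \qquad (10)$$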
where boldface zeros refer to those edges removed from $\mathcal{G}$ because of the matching $\tilde{\mathcal{E}}$.

A generator matrix which is valid for $\mathbf{A}_{\tilde{\mathcal{E}}}$ can be constructed from that of a $[7,4]$ Reed-Solomon code over $\mathbb{F}_7$ with defining set $\{0, 1, \alpha, \ldots, \alpha^5\}$, where $\alpha$ is a primitive element in $\mathbb{F}_7$, using the method described in Section II-A. The polynomials corresponding to the transformation matrix are given by


$$t_1(x) = \alpha^{5}(x - 1)(x - \alpha) \qquad (11)$$

$$t_2(x) = \alpha^{4}\,x(x - \alpha)(x - \alpha^2) \qquad (12)$$

$$t_3(x) = \alpha^{3}\,x(x - 1) \qquad (13)$$

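Finally, the systematic generator matrix for $\mathcal{C}_{sys}$ is

$$\mathbf{G}_{sys} = \begin{bmatrix} 1 & 0 & 0 & \alpha^2 & \alpha^5 & 1 & \alpha^5 \\ 0 & 1 & 0 & 0 & 1 & \alpha^4 & 1 \\ 0 & 0 & 1 & \alpha^5 & \alpha^5 & \alpha^2 & 1 \end{bmatrix} \qquad (14)$$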
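The example can be reproduced numerically. The sketch below rebuilds a systematic generator matrix from the matched adjacency matrix by evaluating the normalized transformation polynomials, and then finds the minimum distance by brute force. It assumes the primitive element $\alpha = 3$ of $\mathbb{F}_7$ (the text only requires $\alpha$ to be primitive, so this specific value is a choice made here), and the helper names `row_from_zeros` and `min_distance` are hypothetical.

```python
from itertools import product

p, alpha = 7, 3   # assumption: alpha = 3, one of the primitive elements of F_7
alphas = [0] + [pow(alpha, e, p) for e in range(6)]   # {0, 1, a, ..., a^5}

A_matched = [      # matched adjacency matrix for the matching m_i -> i-th code symbol
    [1, 0, 0, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1, 1],
]
systematic_col = [0, 1, 2]   # j(i) for each row, 0-indexed

def row_from_zeros(row, pivot):
    """Evaluate t_i(x), with roots at the zeros of `row`, normalized so t_i(alpha_pivot) = 1."""
    def t(x):
        value = 1
        for j, entry in enumerate(row):
            if entry == 0:
                value = value * (x - alphas[j]) % p
        return value
    scale = pow(t(alphas[pivot]), -1, p)   # modular inverse, Python 3.8+
    return [t(a) * scale % p for a in alphas]

def min_distance(G):
    """Brute-force minimum Hamming weight over all nonzero messages."""
    best = len(G[0])
    for msg in product(range(p), repeat=len(G)):
        if any(msg):
            cw = [sum(m * g for m, g in zip(msg, col)) % p for col in zip(*G)]
            best = min(best, sum(1 for v in cw if v))
    return best

G_sys = [row_from_zeros(row, piv) for row, piv in zip(A_matched, systematic_col)]
for row in G_sys:
    print(row)
print(min_distance(G_sys))
```

Under this choice of $\alpha$, the first three columns form an identity matrix, every row vanishes exactly where the matched adjacency matrix does, and the brute-force minimum distance is 4, in agreement with $d_{sys} = 4$.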



VIII. Conclusion 

In this paper, we have studied the problem of analyzing and designing error-correcting codes when the encoding 
of every coded symbol is restricted to a subset of the message symbols. We obtain an upper bound on the minimum 
distance of any such code, similar to the cut-set bounds of [1]. By providing an explicit construction, we show
that under certain assumptions this bound is achievable. Furthermore, the field size required for the construction 
scales linearly with the code length. The second bound is on the minimum distance of linear codes with encoding 
constraints when the generator matrix is required to be in systematic form. We provide a construction that always 
achieves this bound. Since all of our constructions are built as subcodes of Reed-Solomon codes, they can be 
decoded efficiently using standard Reed-Solomon decoders. For future work, it remains to show that the first upper 
bound is achievable in general over small fields. 